Differential dynamic programming

Differential dynamic programming (DDP) is an optimal control algorithm of the trajectory optimization class. The algorithm was introduced in 1966 by Mayne[1] and subsequently analysed in Jacobson and Mayne's eponymous book.[2] The algorithm uses locally-quadratic models of the dynamics and cost functions, and displays quadratic convergence. It is closely related to Pantoja's step-wise Newton's method.[3][4]

Finite-horizon discrete-time problems

The dynamics

$$ \mathbf{x}_{i+1} = \mathbf{f}(\mathbf{x}_i, \mathbf{u}_i) \tag{1} $$

describe the evolution of the state $\mathbf{x}$ given the control $\mathbf{u}$ from time $i$ to time $i+1$. The total cost $J_0$ is the sum of running costs $\ell$ and final cost $\ell_f$, incurred when starting from state $\mathbf{x}$ and applying the control sequence $\mathbf{U} \equiv \{\mathbf{u}_0, \mathbf{u}_1, \dots, \mathbf{u}_{N-1}\}$ until the horizon is reached:

$$ J_0(\mathbf{x}, \mathbf{U}) = \sum_{i=0}^{N-1} \ell(\mathbf{x}_i, \mathbf{u}_i) + \ell_f(\mathbf{x}_N), $$

where $\mathbf{x}_0 \equiv \mathbf{x}$, and the $\mathbf{x}_i$ for $i > 0$ are given by Eq. 1. The solution of the optimal control problem is the minimizing control sequence $\mathbf{U}^*(\mathbf{x}) \equiv \operatorname{argmin}_{\mathbf{U}} J_0(\mathbf{x}, \mathbf{U})$. Trajectory optimization means finding $\mathbf{U}^*(\mathbf{x})$ for a particular $\mathbf{x}_0$, rather than for all possible initial states.
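For concreteness, the following sketch (Python with NumPy) sets up such a finite-horizon problem and evaluates the total cost $J_0$ by rolling Eq. 1 forward. The double-integrator dynamics, the quadratic costs and all numerical values are illustrative assumptions made only for this sketch, not part of the algorithm.

```python
# Illustrative finite-horizon problem (Eq. 1 and the total cost J_0).
# Dynamics, costs and constants are assumed toy values: a discrete-time
# double integrator with quadratic running and final costs.
import numpy as np

dt, N = 0.1, 50                      # step size and horizon (assumed)

def f(x, u):
    """x_{i+1} = f(x_i, u_i): double-integrator dynamics."""
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * u[0]])

def running_cost(x, u):
    """l(x, u): quadratic running cost."""
    return 0.5 * (x @ x) + 0.05 * (u @ u)

def final_cost(x):
    """l_f(x_N): quadratic final cost."""
    return 10.0 * (x @ x)

def total_cost(x0, U):
    """J_0(x, U): roll the controls forward through Eq. 1 and sum the costs."""
    x, J = x0, 0.0
    for u in U:
        J += running_cost(x, u)
        x = f(x, u)
    return J + final_cost(x)

U = np.zeros((N, 1))                 # a candidate control sequence
print(total_cost(np.array([1.0, 0.0]), U))
```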

Dynamic programming

Let $\mathbf{U}_i$ be the partial control sequence $\mathbf{U}_i \equiv \{\mathbf{u}_i, \mathbf{u}_{i+1}, \dots, \mathbf{u}_{N-1}\}$ and define the cost-to-go $J_i$ as the partial sum of costs from $i$ to $N$:

$$ J_i(\mathbf{x}, \mathbf{U}_i) = \sum_{j=i}^{N-1} \ell(\mathbf{x}_j, \mathbf{u}_j) + \ell_f(\mathbf{x}_N). $$

The optimal cost-to-go or value function at time $i$ is the cost-to-go given the minimizing control sequence:

$$ V(\mathbf{x}, i) \equiv \min_{\mathbf{U}_i} J_i(\mathbf{x}, \mathbf{U}_i). $$

Setting $V(\mathbf{x}, N) \equiv \ell_f(\mathbf{x}_N)$, the dynamic programming principle reduces the minimization over an entire sequence of controls to a sequence of minimizations over a single control, proceeding backwards in time:

$$ V(\mathbf{x}, i) = \min_{\mathbf{u}} \left[ \ell(\mathbf{x}, \mathbf{u}) + V(\mathbf{f}(\mathbf{x}, \mathbf{u}), i+1) \right]. \tag{2} $$

This is the Bellman equation.
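A brute-force illustration of Eq. 2 on a coarse grid shows how the minimization over a whole control sequence collapses into $N$ single-control minimizations performed backwards in time. The one-dimensional toy problem, the grid sizes and the interpolation are assumptions made only for this sketch.

```python
# Tabular illustration of the Bellman recursion (Eq. 2) on a coarse grid.
import numpy as np

xs = np.linspace(-2, 2, 21)           # 1-D state grid (assumed toy problem)
us = np.linspace(-1, 1, 11)           # control grid

def f1(x, u):  return np.clip(x + 0.1 * u, xs[0], xs[-1])   # dynamics
def l1(x, u):  return 0.5 * x**2 + 0.05 * u**2               # running cost
def lf(x):     return 5.0 * x**2                              # final cost

N = 20
V = lf(xs)                            # V(x, N) = l_f(x_N)
for i in reversed(range(N)):          # proceed backwards in time
    V_next = V.copy()
    for j, x in enumerate(xs):
        # single-control minimization of l(x,u) + V(f(x,u), i+1)
        V[j] = min(l1(x, u) + np.interp(f1(x, u), xs, V_next) for u in us)
```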

Differential dynamic programming

DDP proceeds by iteratively performing a backward pass on the nominal trajectory to generate a new control sequence, and then a forward pass to compute and evaluate a new nominal trajectory. We begin with the backward pass. If

$$ \ell(\mathbf{x}, \mathbf{u}) + V(\mathbf{f}(\mathbf{x}, \mathbf{u}), i+1) $$

is the argument of the $\min[\cdot]$ operator in Eq. 2, let $Q$ be the variation of this quantity around the $i$-th $(\mathbf{x}, \mathbf{u})$ pair:

$$ \begin{aligned} Q(\delta\mathbf{x}, \delta\mathbf{u}) \equiv {} & \ell(\mathbf{x}+\delta\mathbf{x}, \mathbf{u}+\delta\mathbf{u}) + V(\mathbf{f}(\mathbf{x}+\delta\mathbf{x}, \mathbf{u}+\delta\mathbf{u}), i+1) \\ & - \ell(\mathbf{x}, \mathbf{u}) - V(\mathbf{f}(\mathbf{x}, \mathbf{u}), i+1) \end{aligned} $$

and expand to second order

$$ \approx \frac{1}{2} \begin{bmatrix} 1 \\ \delta\mathbf{x} \\ \delta\mathbf{u} \end{bmatrix}^{\mathsf{T}} \begin{bmatrix} 0 & Q_{\mathbf{x}}^{\mathsf{T}} & Q_{\mathbf{u}}^{\mathsf{T}} \\ Q_{\mathbf{x}} & Q_{\mathbf{x}\mathbf{x}} & Q_{\mathbf{x}\mathbf{u}} \\ Q_{\mathbf{u}} & Q_{\mathbf{u}\mathbf{x}} & Q_{\mathbf{u}\mathbf{u}} \end{bmatrix} \begin{bmatrix} 1 \\ \delta\mathbf{x} \\ \delta\mathbf{u} \end{bmatrix} \tag{3} $$

The $Q$ notation used here is a variant of the notation of Morimoto, where subscripts denote differentiation in denominator layout.[5] Dropping the index $i$ for readability, with primes denoting the next time-step, $V' \equiv V(i+1)$, the expansion coefficients are

$$ \begin{aligned} Q_{\mathbf{x}} &= \ell_{\mathbf{x}} + \mathbf{f}_{\mathbf{x}}^{\mathsf{T}} V'_{\mathbf{x}} \\ Q_{\mathbf{u}} &= \ell_{\mathbf{u}} + \mathbf{f}_{\mathbf{u}}^{\mathsf{T}} V'_{\mathbf{x}} \\ Q_{\mathbf{x}\mathbf{x}} &= \ell_{\mathbf{x}\mathbf{x}} + \mathbf{f}_{\mathbf{x}}^{\mathsf{T}} V'_{\mathbf{x}\mathbf{x}} \mathbf{f}_{\mathbf{x}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{x}\mathbf{x}} \\ Q_{\mathbf{u}\mathbf{u}} &= \ell_{\mathbf{u}\mathbf{u}} + \mathbf{f}_{\mathbf{u}}^{\mathsf{T}} V'_{\mathbf{x}\mathbf{x}} \mathbf{f}_{\mathbf{u}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{u}\mathbf{u}} \\ Q_{\mathbf{u}\mathbf{x}} &= \ell_{\mathbf{u}\mathbf{x}} + \mathbf{f}_{\mathbf{u}}^{\mathsf{T}} V'_{\mathbf{x}\mathbf{x}} \mathbf{f}_{\mathbf{x}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{u}\mathbf{x}}. \end{aligned} $$

The last terms in the last three equations denote contraction of a vector with a tensor. Minimizing the quadratic approximation (3) with respect to $\delta\mathbf{u}$, we have

$$ \delta\mathbf{u}^* = \operatorname{argmin}_{\delta\mathbf{u}} Q(\delta\mathbf{x}, \delta\mathbf{u}) = -Q_{\mathbf{u}\mathbf{u}}^{-1}\left(Q_{\mathbf{u}} + Q_{\mathbf{u}\mathbf{x}}\,\delta\mathbf{x}\right), \tag{4} $$

giving an open-loop term $\mathbf{k} = -Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}}$ and a feedback gain term $\mathbf{K} = -Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}\mathbf{x}}$. Plugging the result back into (3), we now have a quadratic model of the value at time $i$:

$$ \begin{aligned} \Delta V(i) &= -\tfrac{1}{2} Q_{\mathbf{u}}^{\mathsf{T}} Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}} \\ V_{\mathbf{x}}(i) &= Q_{\mathbf{x}} - Q_{\mathbf{x}\mathbf{u}} Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}} \\ V_{\mathbf{x}\mathbf{x}}(i) &= Q_{\mathbf{x}\mathbf{x}} - Q_{\mathbf{x}\mathbf{u}} Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}\mathbf{x}}. \end{aligned} $$

Recursively computing the local quadratic models of $V(i)$ and the control modifications $\{\mathbf{k}(i), \mathbf{K}(i)\}$, from $i = N-1$ down to $i = 1$, constitutes the backward pass. As above, the value is initialized with $V(\mathbf{x}, N) \equiv \ell_f(\mathbf{x}_N)$. Once the backward pass is completed, a forward pass computes a new trajectory:

$$ \begin{aligned} \hat{\mathbf{x}}(1) &= \mathbf{x}(1) \\ \hat{\mathbf{u}}(i) &= \mathbf{u}(i) + \mathbf{k}(i) + \mathbf{K}(i)\left(\hat{\mathbf{x}}(i) - \mathbf{x}(i)\right) \\ \hat{\mathbf{x}}(i+1) &= \mathbf{f}(\hat{\mathbf{x}}(i), \hat{\mathbf{u}}(i)). \end{aligned} $$

The backward passes and forward passes are iterated until convergence. If the Hessians $Q_{\mathbf{x}\mathbf{x}}, Q_{\mathbf{u}\mathbf{u}}, Q_{\mathbf{u}\mathbf{x}}, Q_{\mathbf{x}\mathbf{u}}$ are replaced by their Gauss-Newton approximation, the method reduces to the iterative Linear Quadratic Regulator (iLQR).[6]
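The following self-contained sketch puts the two passes together for an illustrative linear-quadratic toy problem (a double integrator). Because the second-order dynamics terms $\mathbf{f}_{\mathbf{x}\mathbf{x}}, \mathbf{f}_{\mathbf{u}\mathbf{u}}, \mathbf{f}_{\mathbf{u}\mathbf{x}}$ are dropped, it is the Gauss-Newton (iLQR) variant mentioned above; all names and numerical values are assumptions of the sketch rather than a reference implementation.

```python
# One iteration of the backward pass and forward pass for an assumed
# linear-quadratic toy problem (double integrator, quadratic costs).
import numpy as np

dt, N = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])      # f_x for the linear toy dynamics
B = np.array([[0.0], [dt]])                # f_u
Qc, Rc, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)   # cost weights

f  = lambda x, u: A @ x + B @ u            # Eq. 1
l  = lambda x, u: 0.5 * (x @ Qc @ x + u @ Rc @ u)
lf = lambda x: 0.5 * x @ Qf @ x

def backward_pass(xs, us):
    """Compute k(i), K(i) and the local value model from i = N-1 down to 0."""
    k, K = np.zeros((N, 1)), np.zeros((N, 1, 2))
    V_x, V_xx = Qf @ xs[N], Qf             # initialize with V(x, N) = l_f(x_N)
    for i in reversed(range(N)):
        x, u = xs[i], us[i]
        # expansion coefficients of Q (second-order dynamics terms omitted)
        Q_x  = Qc @ x + A.T @ V_x
        Q_u  = Rc @ u + B.T @ V_x
        Q_xx = Qc + A.T @ V_xx @ A
        Q_uu = Rc + B.T @ V_xx @ B
        Q_ux = B.T @ V_xx @ A
        # open-loop and feedback terms, Eq. 4
        k[i] = -np.linalg.solve(Q_uu, Q_u)
        K[i] = -np.linalg.solve(Q_uu, Q_ux)
        # quadratic model of the value at time i
        V_x  = Q_x  + Q_ux.T @ k[i]        # = Q_x  - Q_xu Q_uu^{-1} Q_u
        V_xx = Q_xx + Q_ux.T @ K[i]        # = Q_xx - Q_xu Q_uu^{-1} Q_ux
    return k, K

def forward_pass(xs, us, k, K, alpha=1.0):
    """Roll out the modified controls to obtain a new nominal trajectory."""
    xs_new, us_new = [xs[0]], []
    for i in range(N):
        u = us[i] + alpha * k[i] + K[i] @ (xs_new[i] - xs[i])
        us_new.append(u)
        xs_new.append(f(xs_new[i], u))
    return np.array(xs_new), np.array(us_new)

# iterate backward and forward passes from a zero-control nominal trajectory
us = np.zeros((N, 1))
xs = np.array([[1.0, 0.0]])
for u in us:
    xs = np.vstack([xs, f(xs[-1], u)])
for _ in range(10):
    k, K = backward_pass(xs, us)
    xs, us = forward_pass(xs, us, k, K)
```

For this linear-quadratic example a single iteration already reaches the optimum; for nonlinear dynamics and costs the two passes are repeated until the total cost stops decreasing.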

Differential dynamic programming is a second-order algorithm like Newton's method. It therefore takes large steps toward the minimum and often requires regularization and/or line-search to achieve convergence.[7][8] Regularization in the DDP context means ensuring that the $Q_{\mathbf{u}\mathbf{u}}$ matrix in Eq. 4 is positive definite. Line-search in DDP amounts to scaling the open-loop control modification $\mathbf{k}$ by some $0 < \alpha < 1$.
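A minimal sketch of these two devices might look as follows. The regularization schedule, the candidate step sizes, and the `cost_of_alpha` callable (for example, a wrapper that runs the forward pass with a given $\alpha$ and returns the total cost) are illustrative assumptions, not the article's prescription.

```python
# Hedged sketch: Tikhonov-style regularization of Q_uu and a simple backtracking
# line search on the step size alpha.
import numpy as np

def regularize(Q_uu, mu=1e-6, factor=10.0, max_tries=20):
    """Increase mu until Q_uu + mu*I is positive definite (Cholesky succeeds)."""
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(Q_uu + mu * np.eye(Q_uu.shape[0]))
            return Q_uu + mu * np.eye(Q_uu.shape[0])
        except np.linalg.LinAlgError:
            mu *= factor
    raise RuntimeError("could not make Q_uu positive definite")

def line_search(cost_of_alpha, alphas=(1.0, 0.5, 0.25, 0.1, 0.05)):
    """Try decreasing alpha (scaling the open-loop term k) until the cost decreases."""
    baseline = cost_of_alpha(0.0)      # alpha = 0 reproduces the nominal trajectory
    for alpha in alphas:
        if cost_of_alpha(alpha) < baseline:
            return alpha
    return 0.0                          # no improving step found; keep the nominal
```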

Monte Carlo version

Sampled differential dynamic programming (SaDDP) is a Monte Carlo variant of differential dynamic programming.[9][10][11] It is based on treating the quadratic cost of differential dynamic programming as the energy of a Boltzmann distribution. This way the quantities of DDP can be matched to the statistics of a multidimensional normal distribution. The statistics can be recomputed from sampled trajectories without differentiation.

Sampled differential dynamic programming has been extended to Path Integral Policy Improvement with Differential Dynamic Programming.[12] This creates a link between differential dynamic programming and path integral control,[13] which is a framework of stochastic optimal control.

Constrained problems

Interior Point Differential dynamic programming (IPDDP) is an interior-point method generalization of DDP that can address the optimal control problem with nonlinear state and input constraints.[14]

References

  1. Mayne, D. Q. (1966). "A second-order gradient method of optimizing non-linear discrete time systems". Int J Control 3: 85–95. doi:10.1080/00207176608921369. 
  2. Mayne, David Q.; Jacobson, David H. (1970). Differential dynamic programming. New York: American Elsevier Pub. Co. ISBN 978-0-444-00070-5. https://books.google.com/books?id=tA-oAAAAIAAJ. 
  3. de O. Pantoja, J. F. A. (1988). "Differential dynamic programming and Newton's method". International Journal of Control 47 (5): 1539–1553. doi:10.1080/00207178808906114. ISSN 0020-7179. 
  4. Liao, L. Z. (1992). Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems. https://hdl.handle.net/1813/5474. 
  5. Morimoto, J.; G. Zeglin; C.G. Atkeson (2003). "Minimax differential dynamic programming: Application to a biped walking robot". 2. pp. 1927–1932. 
  6. Baumgärtner, K. (2023). "A Unified Local Convergence Analysis of Differential Dynamic Programming, Direct Single Shooting, and Direct Multiple Shooting". 2023 European Control Conference (ECC). pp. 1–7. doi:10.23919/ECC57647.2023.10178367. https://ieeexplore.ieee.org/document/10178367. 
  7. Liao, L. Z; C. A Shoemaker (1991). "Convergence in unconstrained discrete-time differential dynamic programming". IEEE Transactions on Automatic Control 36 (6): 692. doi:10.1109/9.86943. 
  8. Tassa, Y. (2011). Theory and implementation of bio-mimetic motor controllers (PDF) (Thesis). Hebrew University. Archived from the original (PDF) on 2016-03-04. Retrieved 2012-02-27.
  9. "Sampled differential dynamic programming" (in en-US). doi:10.1109/IROS.2016.7759229. 
  10. Rajamäki, Joose; Hämäläinen, Perttu (June 2018). "Regularizing Sampled Differential Dynamic Programming" (in en-US). 2018 Annual American Control Conference (ACC). pp. 2182–2189. doi:10.23919/ACC.2018.8430799. https://ieeexplore.ieee.org/document/8430799. Retrieved 2018-10-19. 
  11. Rajamäki, Joose (2018) (in en). Random Search Algorithms for Optimal Control. Aalto University. ISBN 978-952-60-8156-4. http://urn.fi/URN:ISBN:978-952-60-8156-4. 
  12. Lefebvre, Tom; Crevecoeur, Guillaume (July 2019). "Path Integral Policy Improvement with Differential Dynamic Programming". 2019 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM). pp. 739–745. doi:10.1109/AIM.2019.8868359. ISBN 978-1-7281-2493-3. https://ieeexplore.ieee.org/document/8868359. 
  13. Theodorou, Evangelos; Buchli, Jonas; Schaal, Stefan (May 2010). "Reinforcement learning of motor skills in high dimensions: A path integral approach". 2010 IEEE International Conference on Robotics and Automation. pp. 2397–2403. doi:10.1109/ROBOT.2010.5509336. ISBN 978-1-4244-5038-1. 
  14. Pavlov, Andrei; Shames, Iman; Manzie, Chris (2020). "Interior Point Differential Dynamic Programming". IEEE Transactions on Control Systems Technology 29 (6): 2720. doi:10.1109/TCST.2021.3049416. Bibcode: 2021ITCST..29.2720P. 
  • The open-source software framework acados provides an efficient and embeddable implementation of DDP.